{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Neural Networks - Part 2\n",
    "\n",
    "2016-09-16, Josh Montague\n",
    "\n",
    "## The Plan\n",
    "\n",
    "- Quick review of [Part 1](https://github.com/DrSkippy/Data-Science-45min-Intros/tree/master/neural-networks-101)\n",
    "- The library stack (Keras, Theano, Numpy, oh my!)\n",
    "- Examples!\n",
    "    - Classification (Iris) \n",
    "    - Classification (MNIST)\n",
    "    - Regression (housing)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Review\n",
    "\n",
    "Back in [Part 1, we looked at some history, motivation, and a simple (if only *mostly*-working 😐) implementation of a neural network](https://github.com/DrSkippy/Data-Science-45min-Intros/tree/master/neural-networks-101). \n",
    "<img src=\"img/NN-2.jpeg\">\n",
    "\n",
    "Recall the short version of how this worked:\n",
    "- there is an array of input \"nodes\" (one per feature) \n",
    "- the input nodes are \"fully-connected\" to an arbitrary number of nodes in the next, \"hidden layer\"\n",
    "- the value of each of the hidden nodes is computed by taking the inner product of the previous layer with the weights matrix, and then passing that linear combination through an \"activation function,\" $f(\\omega^T x)$. We introduced the sigmoid as one possible activation (shown above)\n",
    "- when there are many nodes in the hidden layer(s), the weights form a matrix; the weight connecting nodes $i$ and $j$ (in sequential layers) is matrix element $w_{ij}$\n",
    "- \"forward propagation\" through the network is repeating this for each layer in the network until you get to your predicted output layer\n",
    "- \"backpropagation\" is the process of updating the weight matrix elements $\\omega_{ij}$ by distributing the prediction error backward through the network according to the prediction error and a chosen loss function\n",
    "- forward and backward propagation are repeated a bunch of times until some convergence criteria is achieved\n",
    "\n",
    "Remember that at least one of the reasons why this is an interesting set of techniques to explore is that they a very different way to think of features in a model. We don't have to specify all of the explicit model features in a data matrix e.g. a column for $x$, a column for $x^2$, and $x*y, x*y^2$, and so on. We're defining a structure that allows for stacking of arbitrary, non-linear combinations of the predefined set of data matrix features; this can lead to a more expressive set of features. On the other hand, it also means many more degrees of freedom, which increases computational complexity and decreases interpretability.\n",
    "\n",
    "Moving beyond our ``for`` loop in Part 1, we can look at some more reasonable approaches to using neural networks in practice! In particular, we'll look at [Keras](https://keras.io/), one of the active and growing libraries for building, training, and using neural networks in Python.\n",
    "\n",
    "I think it'll be helpful to understand the stack of libraries and their roles, so hang tight while we run through that, first..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Keras\n",
    "\n",
    "Keras is a modular library with a ``scikit-learn``-inspired API. It lets you write readable Python code to define the structure of a neural network model, as well as (optionally) detailed configuration of *how* the model should evaluate. \n",
    "\n",
    "From the [docs](https://keras.io/):\n",
    "\n",
    "> Keras is a minimalist, highly modular neural networks library, written in Python and capable of running on top of either TensorFlow or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.\n",
    "> \n",
    "> Use Keras if you need a deep learning library that:\n",
    "> \n",
    "> - allows for easy and fast prototyping (through total modularity, minimalism, and extensibility).\n",
    "> - supports both convolutional networks and recurrent networks, as well as combinations of the two.\n",
    "> - supports arbitrary connectivity schemes (including multi-input and multi-output training).\n",
    "> - runs seamlessly on CPU and GPU.\n",
    "\n",
    "There are many libraries for creating neural networks in Python. A quick google search includes: \n",
    "\n",
    "- Keras \n",
    "- TensorFlow\n",
    "- PyBrain\n",
    "- Blocks\n",
    "- Lasagne\n",
    "- Caffe\n",
    "- nolearn\n",
    "- PyML\n",
    "- ... and I'm sure there are more\n",
    "\n",
    "I read the docs for a few, read some reddit and StackOverflow discussions, and asked some practioners that I know for their opinions. My takeaway: **if you're working in Python, familiar with ``scikit-learn``, and want a good on-ramp to neural networks, Keras is a good choice.** \n",
    "\n",
    "For more discussion about the library and it's motivations, check out [the recent Quora Q&A](https://www.quora.com/session/Fran%C3%A7ois-Chollet/1) in which the lead developer gave some great insight into the design and plans for the library.\n",
    "\n",
    "Most of this session will involve writing code with Keras. But Keras doesn't actually do the computation; it uses another library for that (in fact, more than one). For the symbolic computation portion, Keras currently supports [Theano](http://www.deeplearning.net/software/theano/) (the default) and [TensorFlow](https://www.tensorflow.org/). We'll use Theano for this notebook. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Theano\n",
    "\n",
    "From [the docs](http://deeplearning.net/software/theano/introduction.html#introduction):\n",
    "\n",
    "> Theano is a Python library that lets you to define, optimize, and evaluate mathematical expressions, especially ones with multi-dimensional arrays (``numpy.ndarray``). Using Theano it is possible to attain speeds rivaling hand-crafted C implementations for problems involving large amounts of data.\n",
    "\n",
    "Essentially, by using symbolic mathematical expressions, all sorts of compiler and computational  optimizations (including automatic differentiation and dynamically-generated C code!), Theano can make math happen very fast (either using the Python interpreter and ``numpy``, or going right around it to CPU/GPU instructions). An interesting feature of Theano is that executing the same code with a GPU is achieved by simply setting a shell environment variable!\n",
    "\n",
    "One way to think about how these pieces relate to one another is (loosely): \n",
    "\n",
    "```\n",
    "scikit-learn:numpy :: keras:theano(+numpy)\n",
    "```\n",
    "\n",
    "Put another way, here's my attempt at a visual version:\n",
    "\n",
    "<img src=\"img/nn-stack.png\">\n",
    "\n",
    "\n",
    "\n",
    "# Ok, enough talk, let's build something"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from IPython.display import Image\n",
    "Image(data=\"img/mr-t.jpg\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "seed = 1234; np.random.seed(seed)\n",
    "import seaborn as sns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from keras.models import Sequential\n",
    "from keras.layers.core import Dense, Activation\n",
    "\n",
    "from sklearn.cross_validation import train_test_split\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Foreword: \"`random`\"\n",
    "\n",
    "Many of the numerical computations that we do in Python involve sampling from distributions. We often want these to be truly random, but computers can only provide so-called \"pseudo-random\" numbers ([wiki](https://en.wikipedia.org/wiki/Pseudorandom_number_generator), [Python docs](https://docs.python.org/3/library/random.html)).\n",
    "\n",
    "In many of our modeling use-cases (particularly those which are *ad hoc*), having fluctuations in some pseudo-random values is fine. However, when there are variations in, say, the initial conditions for an algorithm, it can lead to variation in the outcome which is not indicative or representative of any true variance in the underlying data. Examples include choosing the starting centroids in a k-means clustering task, or *choosing the weights of a neural network synapse matrix*. \n",
    "\n",
    "When you want results to be reproducible (which is generally a good thing!), you have to \"seed\" the random number generator (RNG). In this way, when you send your code to someone else's computer, or if you run your code 10,000 times, you'll always have the same initial conditions (for parameters that are randomly generated), and you should always get the same results.\n",
    "\n",
    "In a typical Python script that runs all in one session, the line above (`seed = 1234; np.random.seed(seed)`) can be included once, at the top\\*. In an IPython notebook, however, it seems that you need to set the seed in each cell where the random parameter initialization may occur (i.e. in any cell that includes the declaration of a NN model). I'm not 100% positive about this, but this is what I gathered from my experimentation. This is the origin of the assorted calls to `np.random.seed()` you'll see below!\n",
    "\n",
    "\n",
    "\n",
    "# 1: Classification (Iris)\n",
    "\n",
    "To get a sense for how Keras works, we'll start with a simple example: the golden oldie, the iris data set. That way we can focus our attention on the code, not the details of the data.\n",
    "\n",
    "Furthermore, to illustrate the parallels with `scikit-learn`, let's run through *that* demo first. Since the Keras API is ``sklearn``-like (and this team has lots of ``sklearn`` experience), hopefully that will provide some helpful conceptual hooks.\n",
    "\n",
    "## ``sklearn``"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# import data (from seaborn, bc it gives you a df with labels)\n",
    "iris = sns.load_dataset(\"iris\")\n",
    "\n",
    "iris.tail()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# inspect\n",
    "sns.pairplot(iris, hue='species')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# get train/test split (no preprocessing) \n",
    "X = iris[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']].values\n",
    "y = iris['species'].values\n",
    "\n",
    "# take a 75/25 split\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75, random_state=seed)\n",
    "\n",
    "# verify array sizes\n",
    "#[x.shape for x in [X_train, X_test, y_train, y_test]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# fit default LR model\n",
    "model = LogisticRegression()\n",
    "model.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# score on test (should be ~80-90%)\n",
    "print(\"Accuracy = {:.2f}\".format(model.score(X_test, y_test)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Not bad for less than ten lines of code!\n",
    "\n",
    "In practice, we should be a bit more careful and consider some additional work:\n",
    "- preprocess the data (scaling, normalizing)\n",
    "- use cross validation techniques to build an uncertainty or confidence (e.g. k-fold cv)\n",
    "- gridsearch the model parameters \n",
    "- ... etc. ...\n",
    "\n",
    "But for now, we're just trying to show the comparison between the libraries, and this will do.\n",
    "\n",
    "Now, let's write the same kind of classification system in Keras!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## ``keras``\n",
    "\n",
    "**Warning!** *If you have a dataset the size of the iris data (tiny!), you probably shouldn't use a neural network in practice; instead, consider a model that is more interpretable. We're  using #tinydata here because it's simple and common.*\n",
    "\n",
    "### (One-hot) encode the labels\n",
    "\n",
    "We can start the same train- and test-split data arrays. But, we have to make a modification to the output data (labels). ``scikit-learn`` estimators transparently convert categorical labels e.g. strings like \"virginica\" and \"setosa\" into numerical values (or arrays). But we have to do that step manually for the Keras models. \n",
    "\n",
    "We want the model output to be a 3x1 array, where the value at each index represents the probability of that category (0, 1, or 2). The format of this training data, where the truth is 1 and all the other possible values are 0 is also known as a **one-hot encoding.** \n",
    "\n",
    "There are a few ways to do this:\n",
    "- ``pandas.get_dummies()`` (we'll use this one)\n",
    "- ``scikit-learn``'s ``LabelEncoder()``\n",
    "- Keras' ``np_utils.to_categorical()`` \n",
    "- ... or roll your own \n",
    "\n",
    "Here's an example of how ``pd.get_dummies()`` works:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# create a sample array with a few of each species from the original df\n",
    "species_sample = iris.groupby(by='species').head(3)['species']\n",
    "\n",
    "species_sample"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# get a one-hot-encoded frame from the pandas method\n",
    "pd.get_dummies(species_sample, prefix='ohe')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, instead of a single string label as our output (prediction), we have a 3x1 array, where each array item represents one of the possible species, and the non-zero binary value gives us the information we need. \n",
    "\n",
    "``scikit-learn`` was effectively doing this same procedure for us before, but hiding all of the steps that map the labels to the prediction arrays.\n",
    "\n",
    "Back to our original data: we can one-hot encode the y arrays that we got from our train-test split earlier, and can re-use the same X arrays."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# encode the full y arrays\n",
    "ohe_y_train = pd.get_dummies(y_train).values\n",
    "ohe_y_test = pd.get_dummies(y_test).values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define, compile the model\n",
    "\n",
    "Time to make our neural network model!\n",
    "\n",
    "Keras has an object-oriented syntax that starts with a ``model``, then adds ``layers`` and ``activations``.\n",
    "\n",
    "The ``Sequential`` model is the main one we care about - it assumes that you'll tell it a series of layers (and activations) that define the network. Subsequently, we add layers and activations, and then compile the model before we can train it.\n",
    "\n",
    "There is art and science to choosing how many hidden layers and nodes within those layers. We're not going to dive into that in this session (mostly because I don't yet know the answers!), so maintain your skepticism, but just sit with it for now."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# create a new model\n",
    "model = Sequential()\n",
    "\n",
    "# add layers\n",
    "# - the first hidden layer must specify the dimensions of the input layer (4x1, here)\n",
    "# - this adds a 10-node, fully-connected layer following the input layer\n",
    "model.add(Dense(10, input_dim=4))\n",
    "\n",
    "# add an activation to the hidden layer\n",
    "model.add(Activation('sigmoid'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For now, we'll stick to a 3-layer network: input, hidden, and output. \n",
    "\n",
    "The final, output layer needs to have three nodes since we have labels that are 3x1 arrays. So, our layers and sizes are: input (4 nodes), hidden (10 nodes), and output (3 nodes). \n",
    "\n",
    "\n",
    "At this point, I only have a small amount of guidance for choosing activation layers. See the notes at the end of the notebook for a longer discussion. Importantly, when we want our output values to be between 0 and 1, and to represent probabilities of our classes (summing to 1), we choosing the **softmax** activation function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# add the output layer, and a softmax activation\n",
    "model.add(Dense(3))\n",
    "model.add(Activation('softmax'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we compile the model. This is where we can specify the optimizer, and loss function. \n",
    "\n",
    "Since we're using multi-class classification, we'll use the ``categorical_crossentropy`` loss function. This is [the advice that I was able to find](https://keras.io/getting-started/sequential-model-guide/#compilation) most often, but I need to learn more about decision criteria for both optimizers, and loss functions. They can have a big effect on your model accuracy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=[\"accuracy\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we ``fit()`` the compiled model using the original training data, including the one-hot-encoded labels. \n",
    "\n",
    "The ``batch_size`` is how many observations are propagated forward before updating the weights (backpropagation). Typically, this number will be much bigger (the default value is 32), but we have a very tiny data set, so we artificially force this network to update weights with each observation (see the **Warning** above)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# keras uses the same .fit() convention\n",
    "model.fit(X_train, ohe_y_train, batch_size=1, nb_epoch=20, verbose=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can ``evaluate()`` our accuracy by using that method on the test data; this is equivalent to ``sklearn``'s ``score()``."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "loss, metrics = model.evaluate(X_test, ohe_y_test, verbose=0)\n",
    "\n",
    "# score on test (should also be ~80-90%)\n",
    "print(\"Accuracy = {:.2f}\".format(metrics))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Not bad!\n",
    "\n",
    "There are also ``sklearn``-like methods that return class assignment and their probabilities."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "classes = model.predict_classes(X_test, verbose=0)\n",
    "probs = model.predict_proba(X_test, verbose=0)\n",
    "\n",
    "print('(class) [ probabilities ]')\n",
    "print('-'*40)\n",
    "\n",
    "for x in zip(classes, probs):\n",
    "    print('({}) {}'.format(x[0],x[1]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Now, more compact...\n",
    "\n",
    "We walked through that in pieces, but here we can collect all of those steps together to see just how few lines of code it required (though remember that we did have the one additional step of creating one-hot-encoded labels)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "np.random.seed(seed)\n",
    "\n",
    "# instantiate the model\n",
    "model = Sequential()\n",
    "\n",
    "# hidden layer\n",
    "model.add(Dense(10, input_shape=(4,)))\n",
    "model.add(Activation('sigmoid'))\n",
    "\n",
    "# output layer\n",
    "model.add(Dense(3))\n",
    "model.add(Activation('softmax'))\n",
    "\n",
    "# set optimizer, loss fnc, and fit parameters \n",
    "model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=[\"accuracy\"])\n",
    "model.fit(X_train, ohe_y_train, batch_size=1, nb_epoch=20, verbose=0)\n",
    "\n",
    "# score on test set\n",
    "loss, metrics = model.evaluate(X_test, ohe_y_test, verbose=0)\n",
    "print(\"Accuracy = {:.2f}\".format(metrics))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or - even more succinctly - we can build the same model but collapse the structure definition  because of Keras' flexible API..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "np.random.seed(seed)\n",
    "\n",
    "# move the activations into the *layer* definition\n",
    "model = Sequential([\n",
    "            Dense(10, input_dim=4, activation='sigmoid'),\n",
    "            Dense(3, activation='softmax'),\n",
    "            ])\n",
    "\n",
    "model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=[\"accuracy\"])\n",
    "model.fit(X_train, ohe_y_train, batch_size=1, nb_epoch=20, verbose=0)\n",
    "\n",
    "loss, metrics = model.evaluate(X_test, ohe_y_test, verbose=0)\n",
    "print(\"Accuracy = {:.2f}\".format(metrics))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Cool! It seems to work pretty well.\n",
    "\n",
    "\n",
    "### Peeking inside the model\n",
    "\n",
    "At this point, what *is* the ``model`` we created? In addition to it's network structure (layers with sizes and activation functions), we also have the weight matrices. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "for layer in model.layers:\n",
    "    print('name: {}'.format(layer.name))\n",
    "    print('dims (in, out): ({}, {})'.format(layer.input_shape, layer.output_shape))\n",
    "    print('activation: {}'.format(layer.activation))\n",
    "    # nb: I believe the second weight array is the bias term\n",
    "    print('weight matrix: {}'.format(layer.get_weights()))\n",
    "    print()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Saving the model\n",
    "\n",
    "If you're looking to save off a trained network model, these are most of the pieces that you'd need to save to disk. [Keras uses HDF5](https://keras.io/getting-started/faq/#how-can-i-save-a-keras-model) (sort of \"named, organized arrays\") to serialize trained models with a ``model.save()`` (and corresponding ``.load()``) method. \n",
    "\n",
    "If you're looking to save the *definition* of a model, but without all of the weights, you can write it out in simple JSON or YAML representation e.g. ``model.to_json()``."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2: Classification (MNIST)\n",
    "\n",
    "Let's do one more familiar classification problem - last year's 4C dataset: the MNIST image labeling task. \n",
    "\n",
    "This time we will:\n",
    "- have more data (good!)\n",
    "- do a tiny bit of data normalization (smart!)\n",
    "- build a bigger network (more expressive!)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from keras.datasets import mnist"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# the data, shuffled and split between tran and test sets\n",
    "(X_train, y_train), (X_test, y_test) = mnist.load_data()\n",
    "\n",
    "print(\"X_train original shape\", X_train.shape)\n",
    "print(\"y_train original shape\", y_train.shape)\n",
    "print(\"y_test original shape\", y_test.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "Remember that the MNIST data is an array of 28-pixel by 28-pixel \"images\" (brightness values), 60k in the training set, 10k in the test set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "plt.figure(figsize=(8,4))\n",
    "\n",
    "for i in range(3):\n",
    "    plt.subplot(1,3,i+1)\n",
    "    plt.imshow(X_train[i], cmap='gray', interpolation='none')\n",
    "    plt.title(\"Label: {}\".format(y_train[i]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "### Preprocessing and normalization\n",
    "\n",
    "If you recall from the last 4C event, the first step we mostly used in preprocessing this data is to unroll the 2D arrays into a single vector.\n",
    "\n",
    "Then, as with many other optimizations, we'll see better results (with faster convergence) if we standardize the data into a smaller range. This can be done in a number of ways, like `sklearn`'s `StandardScaler` (zero-mean, unit variance), or `Normalize` (scale to unit-norm). For now, we'll just rescale to the range 0-1.\n",
    "\n",
    "Then, we also need to one-hot encode our labels again."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# unroll 2D pixel data into 1D vector\n",
    "X_train = X_train.reshape(60000, 784)\n",
    "X_test = X_test.reshape(10000, 784)\n",
    "\n",
    "# convert from original range (0-255) to 0-1 \n",
    "X_train = X_train / X_train.max()\n",
    "X_test = X_test / X_test.max()\n",
    "\n",
    "# OHE the y arrays\n",
    "ohe_y_train = pd.get_dummies(y_train).values\n",
    "ohe_y_test = pd.get_dummies(y_test).values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we'll built another `Sequential` model. \n",
    "\n",
    "This time, we'll use the more commonly-used `relu` (\"rectified linear unit\") activation function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "np.random.seed(seed)\n",
    "\n",
    "model = Sequential([\n",
    "            Dense(512, input_dim=784, activation='relu'),\n",
    "            Dense(512, activation='relu'),\n",
    "            Dense(10, activation='softmax')\n",
    "        ])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The shape of this network is now: 784 (input nodes) => 512 (hidden nodes) => 512 (hidden nodes) => 10 (output nodes). That's about $784*512*512*10 \\approx 2x10^9$ weights!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])\n",
    "\n",
    "model.fit(X_train, ohe_y_train, batch_size=128, nb_epoch=5, verbose=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "loss, metrics = model.evaluate(X_test, ohe_y_test, verbose=1)\n",
    "\n",
    "print()\n",
    "#print('Test loss:', loss)\n",
    "print('Test accuracy:', metrics)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you recall the 2015 4C leaderboard, a score of 98% would have put you in the top 10% of submissions! \n",
    "\n",
    "Speaking only for myself, the entries that I submitted in that range took **much** more time and effort than those last few notebook cells!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3: Regression (Housing)\n",
    "\n",
    "Finally, let's do an example of modeling a continuous variable - a regresssion task. We'll use another of the canned datasets: the Boston housing price data.\n",
    "\n",
    "This data comprises a few hundred observations of neighborhoods, each of thirteen related features. The target is the median price of the homes in that area (in thousands of dollars). So, this means that the output variable is a continuous, real (and positive) number.\n",
    "\n",
    "You can uncomment the `print(...'DESCR')` cell for a longer description."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from sklearn.datasets import load_boston\n",
    "from sklearn.linear_model import LinearRegression\n",
    "from sklearn.metrics import r2_score, mean_squared_error\n",
    "from sklearn.preprocessing import MinMaxScaler, StandardScaler"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# load + inspect data\n",
    "boston = load_boston()\n",
    "\n",
    "X = boston.data\n",
    "y = boston.target\n",
    "labels = boston.feature_names\n",
    "\n",
    "b_df = pd.DataFrame(X, columns=labels)\n",
    "\n",
    "b_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# built-in information about the dataset and features\n",
    "#print(boston.get(\"DESCR\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since the feature values span many orders of magnitude, we should standardize them for optimization efficiency. Then we can split the data into our train/test split.\n",
    "\n",
    "It's worth noting that we could also experiemnt with standardizing the output variable, as well. For now, we won't."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# standardize the feature data (all features now 0-1)\n",
    "scaler = MinMaxScaler(feature_range=(0, 1))\n",
    "X = scaler.fit_transform(X)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# train/test split\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75, random_state=seed)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# build model\n",
    "np.random.seed(seed)\n",
    "\n",
    "model = Sequential([\n",
    "            # use a single hidden layer, also with 13 nodes\n",
    "            Dense(13, input_dim=13, activation='relu'),\n",
    "            Dense(1)\n",
    "        ])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# compile + fit model\n",
    "model.compile(loss='mean_squared_error', optimizer='rmsprop', metrics=['accuracy'])\n",
    "\n",
    "model.fit(X_train, y_train, batch_size=5, nb_epoch=100, verbose=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# evaluate on test data\n",
    "loss, metrics = model.evaluate(X_test, y_test, verbose=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#print('Test loss:', loss)\n",
    "#print('Test accuracy:', metrics)\n",
    "print('MSE:', metrics)\n",
    "\n",
    "y_pred = model.predict(X_test)\n",
    "print('R^2 score:', r2_score(y_test, y_pred))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "plt.figure(figsize=(8,8))\n",
    "\n",
    "# compare the predictions to test\n",
    "plt.plot(y_test, y_pred, 'o', alpha=0.75, label='model predictions')\n",
    "\n",
    "# draw a diagonal\n",
    "xy = np.linspace(min(y_test), max(y_test))\n",
    "plt.plot(xy, xy, '--', label='truth = pred')\n",
    "\n",
    "plt.title('3-layer NN')\n",
    "plt.xlabel('truth ($k)')\n",
    "plt.ylabel('prediction ($k)')\n",
    "plt.legend(loc='best')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "Cool! \n",
    "\n",
    "It looks like our model struggles a bit with high-valued observations. Something worth digging into if we were to work on optimizing this model for this task. \n",
    "\n",
    "# BUT\n",
    "\n",
    "Just to remind you that this is a toy problem that probably *shouldn't* be solved with a neural network, let's look at the corresponding linear regression model.\n",
    "\n",
    "We use the same data...."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "model = LinearRegression()\n",
    "\n",
    "model.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "y_pred = model.predict(X_test)\n",
    "\n",
    "print('R^2:', r2_score(y_test, y_pred))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And get similar $R^2$ values with a much more interpretable model. We can compare the prediction errors to the same chart from before..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "plt.figure(figsize=(8,8))\n",
    "\n",
    "# compare the predictions to test\n",
    "plt.plot(y_test, y_pred, 'o', alpha=0.75, label='model predictions')\n",
    "\n",
    "# draw the diagonal\n",
    "xy = np.linspace(min(y_test), max(y_test))\n",
    "plt.plot(xy, xy, '--', label='truth = pred')\n",
    "\n",
    "plt.title('Linear Regression')\n",
    "plt.xlabel('truth ($k)')\n",
    "plt.ylabel('prediction ($k)')\n",
    "plt.legend(loc='best')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And - **the reason why a linear model should often be preferred** - we can just look straight at the feature coefficients and read off how they relate to the predictions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "plt.figure(figsize=(8,8))\n",
    "\n",
    "# where to position the bars/ticks\n",
    "locs = range(len(model.coef_))\n",
    "\n",
    "plt.barh(locs, model.coef_, align='center')\n",
    "plt.yticks(locs, b_df.columns);\n",
    "\n",
    "plt.title('linear regression coefficients')\n",
    "plt.xlabel('value')\n",
    "plt.ylabel('coefficient')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Wrap Up\n",
    "\n",
    "Hopefully between [Part 1](https://github.com/DrSkippy/Data-Science-45min-Intros/tree/master/neural-networks-101), and now this - Part 2 - you've gained a bit deeper understanding for how neural networks work, and how to use Keras to build and train them. \n",
    "\n",
    "At this point, the only thing we haven't *really* illustrated is how to use them at the #bigdata scale (or with unconventional data types)  where they have proven particularly valuable. Perhaps there will be a Part 3...\n",
    "\n",
    "\n",
    "## What next?\n",
    "\n",
    "If you want to follow up (or go deeper) on the concepts that we covered, here are some links\n",
    "  \n",
    "- [What optimizer should I use?](http://sebastianruder.com/optimizing-gradient-descent/index.html#visualizationofalgorithms)\n",
    "- [What loss function should I use?](https://keras.io/getting-started/sequential-model-guide/#compilation)\n",
    "    - unfortunately, these are examples and not a rigorous discussion\n",
    "- [Keras FAQ](https://keras.io/getting-started/faq/)\n",
    "- [Keras' collection of pre-made examples](https://github.com/fchollet/keras/tree/master/examples)\n",
    "- [`sklearn` Keras wrappers](https://keras.io/scikit-learn-api/)\n",
    "    - allow you to mix in things from `sklearn` like `Pipeline`, `GridSearch`, etc.\n",
    "- [Telling Theano to use a GPU](http://deeplearning.net/software/theano/tutorial/using_gpu.html)\n",
    "\n",
    "  \n",
    "## Acknowledgements\n",
    "\n",
    "In addition to the links already given, most of this notebook was cobbled together based on other examples I found online, including:\n",
    "- [many MLM posts](http://machinelearningmastery.com/blog/)\n",
    "- [Fastforward Labs' `keras-hello-world`](https://github.com/fastforwardlabs/keras-hello-world)\n",
    "- [wxs' `keras-mnist-tutorial`](https://github.com/wxs/keras-mnist-tutorial)\n",
    "- and probably others..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
